CLAIMS 

[1] A program conversion device for a processor which has an
instruction set including an instruction that waits for a
predetermined response from an outside source when the
instruction is executed, comprising:

a loop structure transforming unit operable to perform double
looping transformation so as to transform a structure of a loop,
which is included in an input program and whose iteration count is x,
into a nested structure where a loop whose iteration count is y is an
inner loop and a loop whose iteration count is x/y is an outer loop;
and

an instruction placing unit operable to convert the input 
program into an output program including the instruction by placing 
the instruction in a position outside the inner loop. 
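By way of non-limiting illustration, the double looping transformation of Claim 1 might be sketched as follows. The function name `double_looped`, the callback `wait_instruction` (a stand-in for the instruction that waits for an outside response), and the requirement that x be divisible by y are assumptions made for this sketch only; they do not appear in the claims.

```python
def double_looped(data, y, wait_instruction):
    """Sum `data` using a nested loop: inner count y, outer count x/y.

    `wait_instruction` is a hypothetical stand-in for the instruction that
    waits for a response from an outside source. It is placed outside the
    inner loop, so it runs x/y times instead of x times.
    """
    x = len(data)
    assert x % y == 0, "this sketch assumes x is a multiple of y"
    total = 0
    for outer in range(x // y):       # outer loop: x/y iterations
        base = outer * y
        wait_instruction(base)        # placed outside the inner loop
        for inner in range(y):        # inner loop: y iterations
            total += data[base + inner]
    return total


calls = []
result = double_looped(list(range(12)), 4, calls.append)
```

With x = 12 and y = 4, the stand-in instruction is issued three times (once per outer iteration), while the loop body still runs twelve times.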

[2] The program conversion device according to Claim 1, 
wherein said loop structure transforming unit includes: 
a loop detecting unit operable to detect a loop included in the
input program;

an iteration count detecting unit operable to detect an
iteration count of the detected loop;

a response wait cycle count detecting unit operable to detect
the number of response wait cycles which is the number of cycles to
wait for the predetermined response when the instruction is
executed;

a cycles-per-sequence detecting unit operable to detect the 
number of cycles per sequence required for one set of iteration 
processing of the detected loop; 

a loop splitting unit operable to split off, from the detected
loop, a loop whose iteration count is derived from (the number of
response wait cycles/the number of cycles per sequence); and

a double looping transforming unit operable to perform
double looping transformation so as to build a nested structure
where the loop whose iteration count is derived from (the number of
response wait cycles/the number of cycles per sequence) is an inner
loop and a loop whose iteration count is derived from (the iteration
count of the detected loop/the iteration count of the inner loop) is an
outer loop. 
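By way of non-limiting illustration, the iteration counts recited in Claim 2 might be computed as in the sketch below. The claim says only that the inner count is "derived from" the ratio; rounding that ratio up is an assumption of this sketch, as are the function name `plan_double_loop` and the reporting of a leftover remainder.

```python
def plan_double_loop(x, wait_cycles, cycles_per_sequence):
    """Return (inner_count, outer_count, remainder) for a loop of x iterations.

    inner_count is derived from wait_cycles / cycles_per_sequence. Rounding
    up (an assumption of this sketch) makes one pass of the inner loop take
    at least as many cycles as the response wait.
    """
    inner_count = -(-wait_cycles // cycles_per_sequence)  # ceiling division
    outer_count = x // inner_count
    remainder = x % inner_count
    return inner_count, outer_count, remainder


plan = plan_double_loop(100, 80, 10)
```

For x = 100, 80 response wait cycles, and 10 cycles per sequence, the sketch yields an inner count of 8, an outer count of 12, and 4 leftover iterations.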

[3] The program conversion device according to Claim 1, further 
comprising 

an optimization directive information receiving unit operable
to receive optimization directive information which relates to
optimization.

[4] The program conversion device according to Claim 3, 
wherein said optimization directive information receiving unit
is operable to receive a minimum iteration count of the loop included
in the input program,
said loop structure transforming unit is operable to, when an
execution count of the loop is non-fixed, extract iteration processing
having the minimum iteration count from the loop on the basis of the
minimum iteration count and to perform double looping
transformation on the extracted iteration processing of the loop.

[5] The program conversion device according to Claim 1, 
wherein the instruction is an instruction that has a possibility
of causing an interlock.

[6] The program conversion device according to Claim 5, 

wherein the instruction that has a possibility of causing an
interlock is a prefetch instruction for prefetching data from main
memory to a cache.

[7] The program conversion device according to Claim 6, further
comprising 

a scheduling unit operable to perform instruction scheduling, 
wherein said loop structure transforming unit is operable to
split off, from the loop whose iteration count is x, a loop whose
iteration count is y and which is executed corresponding to the
number of cycles required to execute the prefetch instruction, based
on a result obtained by said scheduling unit, and operable to
perform double looping transformation so as to build a nested
structure where the loop whose iteration count is y is an inner loop
and a loop whose iteration count is x/y is an outer loop.

[8] The program conversion device according to Claim 1, 

wherein after the instruction is executed, a plurality of cycles
are required until a time comes when a predetermined resource will
be referable.

[9] The program conversion device according to Claim 8, 

wherein the instruction that requires the plurality of cycles is an
instruction for accessing one of main memory and a cache.

[10] The program conversion device according to Claim 1, 

wherein said loop structure transforming unit is operable to
split off, from the loop whose iteration count is x, the loop whose
iteration count is y and which is executed in accordance with an
advance in a cache line size made by an address of an array
referenced within the loop whose iteration count is x, and operable
to perform double looping transformation so that the loop whose
iteration count is y is an inner loop and the loop whose iteration
count is x/y is an outer loop.
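By way of non-limiting illustration, Claim 10 might be read as choosing y so that one pass of the inner loop advances the array address by one cache line. In the sketch below, the function name `line_blocked_sum`, the callback `prefetch` (a hypothetical stand-in that receives a byte offset), and the parameters `elem_bytes` and `line_bytes` are assumptions of this sketch, and x is assumed to be a multiple of y.

```python
def line_blocked_sum(a, elem_bytes, line_bytes, prefetch):
    """Sum `a` with an inner loop that covers exactly one cache line.

    y = line_bytes / elem_bytes elements fit in one line, so each pass of
    the inner loop advances the array address by one cache line. The
    hypothetical `prefetch` callback is issued once per outer iteration
    with the byte offset of the following line.
    """
    y = line_bytes // elem_bytes
    x = len(a)
    assert x % y == 0, "this sketch assumes x is a multiple of y"
    total = 0
    for start in range(0, x, y):               # outer loop: x/y iterations
        # Note: on the last outer iteration this offset lies past the end
        # of the array; the sketch does not guard against that.
        prefetch((start + y) * elem_bytes)
        for i in range(start, start + y):      # inner loop: y iterations
            total += a[i]
    return total


offsets = []
line_sum = line_blocked_sum(list(range(16)), 4, 32, offsets.append)
```

With 4-byte elements and a 32-byte line, y is 8, so sixteen elements are processed in two outer iterations and the stand-in prefetch is issued twice.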

[11] The program conversion device according to Claim 10, 
wherein when a plurality of arrays are present, said loop
structure transforming unit is operable to further perform, in
accordance with the number of the arrays, proportional dividing
transformation to proportionally divide the loop whose iteration
count is y and on which the double looping transformation has been
performed.

[12] The program conversion device according to Claim 11, 

wherein when sizes of array elements of the plurality of arrays
are different, the loop whose iteration count is y is proportionally
divided in the proportional dividing transformation in accordance
with a ratio of the sizes.

[13] The program conversion device according to Claim 11, 

wherein when each stride of the plurality of arrays is different, 
a stride referring to addresses advanced per set of the iteration 
processing of the loop, the loop whose iteration count is y is 
proportionally divided in the proportional dividing transformation in 
accordance with a ratio of the strides.

[14] The program conversion device according to Claim 11, 

wherein when an inner loop is transformed, a conditional
statement is generated for each divided loop and the proportional
dividing transformation is performed so that each divided loop is
executed within a same inner loop.
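By way of non-limiting illustration, the conditional statements of Claim 14 might look like the sketch below for two arrays of equal element size. The equal split at y/2, the function name `divided_inner_loop`, and the callback `prefetch` (a hypothetical stand-in taking an array label and an element index) are assumptions of this sketch; the claims do not fix where each divided portion begins.

```python
def divided_inner_loop(a, b, y, prefetch):
    """Element-wise sum of `a` and `b` with one shared inner loop.

    The inner loop of y iterations is divided into two equal portions, one
    per array. A conditional statement at the start of each portion issues
    the hypothetical prefetch for that array, so both divided portions run
    inside the same inner loop.
    """
    x = len(a)
    assert len(b) == x and x % y == 0 and y % 2 == 0
    half = y // 2
    out = []
    for start in range(0, x, y):               # outer loop: x/y iterations
        for k in range(y):                     # single shared inner loop
            if k == 0:                         # first divided portion
                prefetch("a", start + y)
            if k == half:                      # second divided portion
                prefetch("b", start + y)
            out.append(a[start + k] + b[start + k])
    return out


issued = []
sums = divided_inner_loop([1] * 8, [2] * 8, 4, lambda name, i: issued.append((name, i)))
```

Spreading the two prefetches across the inner loop, rather than issuing both at the same point, is the design choice this sketch is meant to show.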

[15] The program conversion device according to Claim 10, 

wherein when the loop whose iteration count is y is split off
from the loop whose iteration count is x and a remainder z left over
after a calculation of x/y is not zero, said loop structure
transforming unit is operable to perform peeling processing and
then double looping transformation on iteration processing that is to
be executed z number of times.

[16] The program conversion device according to Claim 15, 

wherein when the remainder z is not zero, said loop structure 
transforming unit is operable to generate a conditional statement for
judging whether a loop count of an inner loop is y or z and to perform 
double looping transformation. 
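By way of non-limiting illustration, the conditional statement of Claim 16 might be sketched as follows. The function name `double_loop_with_remainder` and the callback `wait_instruction` are assumptions of this sketch; the conditional selects y for every full block and z for the final partial block.

```python
def double_loop_with_remainder(data, y, wait_instruction):
    """Sum `data` when x is not a multiple of y.

    A conditional statement judges whether the inner loop count is y (a
    full block) or z = x mod y (the final partial block). The hypothetical
    `wait_instruction` stays outside the inner loop.
    """
    x = len(data)
    z = x % y
    outer_count = x // y + (1 if z else 0)
    total = 0
    for outer in range(outer_count):
        base = outer * y
        inner_count = y if base + y <= x else z   # y or z
        wait_instruction(base)                    # outside the inner loop
        for k in range(inner_count):
            total += data[base + k]
    return total


seen = []
total_with_tail = double_loop_with_remainder(list(range(10)), 4, seen.append)
```

With x = 10 and y = 4, the remainder z is 2, so the inner loop runs 4, 4, and then 2 times.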

[17] The program conversion device according to Claim 10, 
wherein when an execution count of a loop is non-fixed, said
loop structure transforming unit is operable to judge the execution
count of the loop when the loop is executed and to perform double 
looping transformation so as to dynamically vary an iteration count 
in accordance with a judgment result. 

[18] The program conversion device according to Claim 10, further 
comprising 

a receiving unit operable to receive information showing that 
arrays are aligned to a cache line size, 
wherein said instruction placing unit is operable to place a
prefetch instruction in the loop, whose iteration count is x, for
prefetching data stored one cache line ahead of data to be 
referenced within the iteration processing of the loop that is 
executed x number of times. 

[19] The program conversion device according to Claim 10, 

wherein said optimization directive information receiving unit 
is operable to receive information showing a relative position in a 
cache line, from which the array starts to access, 
said loop structure transforming unit is operable to perform
the double looping transformation in accordance with the
information.

[20] The program conversion device according to Claim 10,

wherein when the arrays are not aligned to the cache line size, 
said instruction placing unit is operable to place a prefetch 
instruction in the loop, whose iteration count is x, for prefetching
data stored two cache lines ahead of data to be referenced within 
the iteration processing of the loop that is executed x number of 
times. 

[21] The program conversion device according to Claim 10,

wherein when the arrays are not aligned to the cache line size, 
said loop structure transforming unit is operable to judge a relative 
position in a cache line, from which the array starts to access, and 
operable to perform double looping transformation in accordance
with a judgment result.

[22] The program conversion device according to Claim 10, further 
comprising 

a receiving unit operable to receive information that relates to 
a focused array,

wherein said loop structure transforming unit is operable to 
perform double looping transformation only on the focused array. 

[23] The program conversion device according to Claim 1, 
wherein said loop structure transforming unit is operable to
further perform double looping transformation on an outer loop,
considering an innermost loop as one block. 

[24] A program conversion method for a processor which has an 
instruction set including an instruction that waits for a
predetermined response from an outside source when the
instruction is executed, comprising:

a step of performing double looping transformation so as to
transform a structure of a loop, which is included in an input
program and whose iteration count is x, into a nested structure
where a loop whose iteration count is y is an inner loop and a loop
whose iteration count is x/y is an outer loop; and

a step of converting the input program into an output program
including the instruction by placing the instruction in a position
outside the inner loop.

[25] A program realizing a program conversion method for a
processor which has an instruction set including an instruction that
waits for a predetermined response from an outside source when the 
instruction is executed, the program causing a computer to execute: 
a step of performing double looping transformation so as to
transform a structure of a loop, which is included in an input
program and whose iteration count is x, into a nested structure
where a loop whose iteration count is y is an inner loop and a loop
whose iteration count is x/y is an outer loop; and 

a step of converting the input program into an output program
including the instruction by placing the instruction in a position
outside the inner loop.